feat: Implement Scrapy HTTP cache backend by honzajavorek · Pull Request #403 · apify/apify-sdk-python

honzajavorek · 2025-02-14T15:50:57Z

I'm successfully using this code in my project. In 381c044 I explicitly state the licensing. I haven't added docs or anything else (yet); I just made sure the code passes the linters and the type check.

Relates: apify/actor-templates#303

vdusek

To make this work by default, also adjust apify.scrapy.utils.apply_apify_settings. If this allows us to handle platform migration, we should turn caching on by default. Also, please add reasonable comments there.
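To illustrate why a persistent cache helps with platform migration, here is a minimal, self-contained sketch of the Scrapy cache-storage interface (`open_spider`, `close_spider`, `retrieve_response`, `store_response`). It is not the code from this PR. A plain dict stands in for an Apify key-value store, and `FakeRequest`, `FakeResponse`, and `DictCacheStorage` are hypothetical stand-ins so the example runs without Scrapy installed. If the store survives a restart, the migrated run gets cache hits instead of re-downloading pages.

```python
from __future__ import annotations

import hashlib
import time
from dataclasses import dataclass, field


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request (only what the sketch needs)."""

    method: str
    url: str


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a Scrapy response."""

    url: str
    status: int
    body: bytes


@dataclass
class DictCacheStorage:
    """Sketch of a Scrapy-style cache storage; the dict plays the key-value store."""

    expiration_secs: int = 0  # 0 = never expire, mirroring HTTPCACHE_EXPIRATION_SECS
    kvs: dict = field(default_factory=dict)

    def open_spider(self, spider) -> None:
        pass  # a real backend would open the key-value store here

    def close_spider(self, spider) -> None:
        pass  # ...and could clean up expired entries here

    def _key(self, request: FakeRequest) -> str:
        # Simplified fingerprint; Scrapy has its own request fingerprinter.
        return hashlib.sha1(f"{request.method} {request.url}".encode()).hexdigest()

    def store_response(self, spider, request: FakeRequest, response: FakeResponse) -> None:
        self.kvs[self._key(request)] = {
            "timestamp": time.time(),
            "url": response.url,
            "status": response.status,
            "body": response.body,
        }

    def retrieve_response(self, spider, request: FakeRequest) -> FakeResponse | None:
        data = self.kvs.get(self._key(request))
        if data is None:
            return None  # cache miss: the request goes to the network
        if 0 < self.expiration_secs < time.time() - data["timestamp"]:
            return None  # entry is older than the configured expiration
        return FakeResponse(url=data["url"], status=data["status"], body=data["body"])
```

In a real run, the dict would be replaced by a store that outlives the process, which is exactly what makes a restarted spider cheap.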

src/apify/scrapy/cache.py

honzajavorek · 2025-02-18T15:00:27Z

Thanks for the review! I considered this a kick-off that would definitely get a review with changes requested. If #404 lets me, I'll work on addressing the comments in the coming days.

honzajavorek · 2025-02-18T19:54:47Z

I bet I've seen comments about adding this to the Apify settings by default, but I can't find them now. I think it's a good idea to set the cache storage by default, but I wouldn't turn caching on by default, because exactly how caching should work (e.g. the expiration time) is application-specific. We should just document somewhere that this is the preferred way to handle forced restarts, if they happen.
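Under this split (storage set by default, caching opt-in), a project that wants to survive forced restarts would enable caching itself. A sketch of the relevant lines in a Scrapy `settings.py`: `HTTPCACHE_ENABLED` and `HTTPCACHE_EXPIRATION_SECS` are standard Scrapy settings, the two-hour expiration is an arbitrary example value, and the `apify.scrapy.extensions.ApifyCacheStorage` path is an assumption about where the class is exposed.

```python
# settings.py of a Scrapy project (sketch).
# Caching stays opt-in: the project decides whether to cache and for how long.
HTTPCACHE_ENABLED = True

# Application-specific choice: treat cached responses older than 2 hours as stale.
# 0 (Scrapy's default) would mean cached responses never expire.
HTTPCACHE_EXPIRATION_SECS = 7200

# Assumed public path of the storage class; shown explicitly here even if
# the Apify defaults already set it.
HTTPCACHE_STORAGE = "apify.scrapy.extensions.ApifyCacheStorage"
```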

vdusek · 2025-02-19T10:47:07Z

Agreed, thanks 🙂.

honzajavorek · 2025-03-10T13:15:57Z

Just for the record, the comment I'd seen didn't get lost; it's the final comment of the review 😄

honzajavorek · 2025-03-10T14:20:32Z

I think I'm done for now!

docs/02_guides/05_scrapy.mdx

This code was originally developed for the https://github.com/juniorguru/plucker/ project, which is licensed under AGPL-3.0-only. I am the sole author of that code, and I hereby grant the https://github.com/apify/apify-sdk-python/ project the right to use it under the Apache-2.0 license, without the copyleft taking effect. My intention is to contribute this code upstream, remove it from my project, and only import the component from the apify package as a dependency, as I believe this component could be useful to other users of the apify package.

src/apify/scrapy/extensions/_httpcache.py

vdusek

LGTM, thanks

honzajavorek · 2025-03-18T22:13:37Z

honzajavorek changed the title ~~Implement Scrapy HTTP cache backend~~ feat: Implement Scrapy HTTP cache backend Feb 14, 2025

honzajavorek temporarily deployed to fork-pr-integration-tests February 14, 2025 17:01 with GitHub Actions

This was referenced Feb 14, 2025

Create Pull Request to Apify SDK with cache juniorguru/plucker#112 (closed)

Scrapy scheduler emits timeout errors #404 (closed)

vdusek requested changes Feb 18, 2025

vdusek added the t-tooling (owned by the tooling team) and enhancement (new feature or request) labels Feb 18, 2025

vdusek added this to the 108th sprint - Tooling team milestone Feb 18, 2025

honzajavorek force-pushed the honzajavorek/cache branch from 92f6435 to 3a2cc13 March 10, 2025 13:27

honzajavorek requested a review from vdusek March 10, 2025 14:20

vdusek reviewed Mar 11, 2025

docs/02_guides/05_scrapy.mdx (resolved)

vdusek mentioned this pull request Mar 13, 2025

Execution of integration tests from forks doesn't work #429 (open)

honzajavorek added 13 commits March 14, 2025 11:47

style: format code

96f1142

fix: make linter happy

40076f3

fix: return back nested syntax

eb5b73a

This error was introduced when I tried to make all the linters happy, uh.

feat: introduce the extensions package

ece55b1

refactor: make linter happy

468d3d2

fix: don't use public properties

48ee026

fix: rename module to httpcache and fix tests location

f6701db

docs: improve docstring of ApifyCacheStorage, describe usage

21ce5fc

refactor: rename kv to kvs per convention

c19a93e

feat: set HTTPCACHE_STORAGE in apply_apify_settings, document usage

d8adf62
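A hypothetical sketch of the behaviour this commit describes: the defaults point `HTTPCACHE_STORAGE` at the Apify-backed class but leave `HTTPCACHE_ENABLED` and `HTTPCACHE_EXPIRATION_SECS` to the project. `apply_cache_defaults` is an illustrative name, a plain dict stands in for Scrapy's `Settings` object, and the class path is assumed; this is not the actual body of `apply_apify_settings`.

```python
# Assumed public path of the storage class (illustrative).
CACHE_STORAGE_PATH = "apify.scrapy.extensions.ApifyCacheStorage"


def apply_cache_defaults(settings: dict) -> dict:
    """Hypothetical helper: route the HTTP cache to Apify storage, keep caching opt-in."""
    result = dict(settings)  # don't mutate the caller's dict
    # Storage is always the Apify-backed one, so a restarted run could reuse it.
    result["HTTPCACHE_STORAGE"] = CACHE_STORAGE_PATH
    # HTTPCACHE_ENABLED and HTTPCACHE_EXPIRATION_SECS are deliberately untouched:
    # whether to cache, and for how long, is application-specific.
    return result
```

Leaving `HTTPCACHE_ENABLED` alone means projects that never opt in see no behaviour change, while projects that do opt in get persistent storage without extra configuration.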

docs: improve stylistics

1b02e45

docs: document workaround for apify/actor-templates#303

b6969ba

honzajavorek added 2 commits March 14, 2025 11:47

docs: improve stylistics

b40d555

fix: change public path of the ApifyCacheStorage class

d07cbe1

honzajavorek force-pushed the honzajavorek/cache branch from fa49081 to d07cbe1 March 14, 2025 10:47

honzajavorek requested a review from vdusek March 14, 2025 11:05

This comment was marked as resolved.

vdusek reviewed Mar 14, 2025

src/apify/scrapy/extensions/_httpcache.py (outdated, resolved)

honzajavorek temporarily deployed to fork-pr-integration-tests March 14, 2025 11:29 with GitHub Actions

Update run_code_checks.yaml

41af4df

vdusek temporarily deployed to fork-pr-integration-tests March 14, 2025 11:52 with GitHub Actions

feat: support wider variety of spider names

4ff56d7
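One way to support arbitrary spider names is to slugify them before using them in a key-value store name. The sketch below assumes store names may only contain lowercase `a-z`, digits, and hyphens; `spider_name_to_slug` is a hypothetical helper and not necessarily what this commit does.

```python
import re


def spider_name_to_slug(spider_name: str) -> str:
    """Hypothetical helper: make a spider name safe for a storage name.

    Assumption: storage names may only contain a-z, 0-9 and '-'.
    """
    # Lowercase, then replace every run of disallowed characters with one hyphen.
    slug = re.sub(r"[^a-z0-9-]+", "-", spider_name.lower())
    # Collapse repeated hyphens and trim them from both ends.
    slug = re.sub(r"-{2,}", "-", slug).strip("-")
    if not slug:
        raise ValueError(f"Spider name {spider_name!r} yields an empty slug")
    return slug
```

For example, `spider_name_to_slug("My_Spider.v2")` gives `"my-spider-v2"`.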

vdusek reviewed Mar 18, 2025

src/apify/scrapy/extensions/_httpcache.py (resolved)

honzajavorek temporarily deployed to fork-pr-integration-tests March 18, 2025 13:13 with GitHub Actions

fix: truncate too long kvs names

22af7ac
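A sketch of the truncation idea, assuming a 63-character limit on storage names and a hypothetical `httpcache-` prefix; both the limit and `build_kvs_name` are illustrative assumptions, not taken from the PR.

```python
MAX_KVS_NAME_LENGTH = 63  # assumed limit for storage names


def build_kvs_name(slug: str, prefix: str = "httpcache-",
                   max_length: int = MAX_KVS_NAME_LENGTH) -> str:
    """Hypothetical helper: prefix a slug and cut it to the length limit."""
    name = f"{prefix}{slug}"[:max_length]
    # The cut can land right after a hyphen; drop any dangling hyphens.
    return name.rstrip("-")
```

Plain truncation means two long spider names sharing their first 53 characters would map to the same store; if that matters, appending a short hash of the full name is one option.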

honzajavorek temporarily deployed to fork-pr-integration-tests March 18, 2025 13:56 with GitHub Actions

honzajavorek temporarily deployed to fork-pr-integration-tests March 18, 2025 13:59 with GitHub Actions

vdusek approved these changes Mar 18, 2025

vdusek merged commit 137e3c8 into apify:master Mar 18, 2025
25 of 27 checks passed

vdusek mentioned this pull request Mar 18, 2025

Scrapy template doesn't handle imminent migration to another host apify/actor-templates#303 (closed)

honzajavorek deleted the honzajavorek/cache branch March 18, 2025 22:13

honzajavorek commented Feb 14, 2025 • edited by vdusek

vdusek left a comment

honzajavorek commented Feb 18, 2025

honzajavorek commented Feb 18, 2025

vdusek commented Feb 19, 2025

honzajavorek commented Mar 10, 2025

honzajavorek commented Mar 10, 2025

vdusek left a comment

honzajavorek commented Mar 18, 2025

2 participants